fix: reorder BackFill operations to prevent dirty_data_key_count_ underflow#465
fix: reorder BackFill operations to prevent dirty_data_key_count_ underflow#465githubzilla wants to merge 1 commit intomainfrom
Conversation
…erflow BackFill() in TemplateCcMap called SetCkptTs() (which sets the flush bit) before SetCommitTsPayloadStatus() (which overwrites the entire commit_ts_and_status_ field, clearing the flush bit). This left backfilled entries dirty without a corresponding dirty-count increment. When those entries were later flushed by checkpoint, the decrement caused dirty_data_key_count_ to underflow, triggering the assertion in AdjustDataKeyStats. Fix: 1. template_cc_map.h BackFill(): call SetCommitTsPayloadStatus() before SetCkptTs() so the flush bit set by SetCkptTs() is preserved. Add OnCommittedUpdate() to reconcile dirty-key stats. 2. template_cc_map.h RemoteReadOutsideCc(): add OnCommittedUpdate() for consistency with other backfill paths. 3. cc_req_misc.cpp UpdateCceCkptTsCc::Execute(): relax 4 assertions that checked !IsPersistent() — BackFill can legitimately mark entries as persistent before the checkpoint callback fires. The existing was_dirty/OnEntryFlushed logic already handles this case correctly.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR reorders flush-related operations in the template concurrency control map, ensuring payload updates occur before checkpoint timestamp settings and checkpoint flushing. Assertions in the commit-timestamp callback are relaxed to permit concurrent BackFill operations that may have already marked entries as flushed. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…erflow Update data_substrate submodule to include the fix for an intermittent assertion failure in AdjustDataKeyStats where dirty_data_key_count_ underflows during cluster scale-out. See eloqdata/tx_service#465 for details.
…erflow Update data_substrate submodule to include the fix for an intermittent assertion failure in AdjustDataKeyStats where dirty_data_key_count_ underflows during cluster scale-out. See eloqdata/tx_service#465 for details.
Summary
Fix an intermittent assertion failure in
AdjustDataKeyStats(cc_shard.cpp:428) wheredirty_data_key_count_underflows during cluster scale-out (test_add_node_with_double_write).Root Cause
BackFill()inTemplateCcMap(template_cc_map.h) called operations in the wrong order:SetCkptTs(commit_ts)— sets the flush bit (marks entry as persistent)OnFlushed()— runs dirty-key accounting (sees entry as persistent, no dirty increment)SetCommitTsPayloadStatus(commit_ts, status)— overwritescommit_ts_and_status_entirely, clearing the flush bitAfter step 3, the entry is left dirty but
dirty_data_key_count_was never incremented for it. When checkpoint later flushes this entry and decrements the counter, it underflows — triggering the assertion.Changes
1.
template_cc_map.h—BackFill()(primary fix)Reordered operations so
SetCommitTsPayloadStatus()runs beforeSetCkptTs():SetCommitTsPayloadStatus(commit_ts, status)— updates the payload and commit timestampSetCkptTs(commit_ts)— sets the flush bit after, so it is preservedOnFlushed()+OnCommittedUpdate()— reconciles dirty-key stats with the entry in its correct final stateThis matches the correct ordering already used in
ObjectCcMap::BackFill().2.
template_cc_map.h—RemoteReadOutsideCc()Added
OnCommittedUpdate()call for consistency with other backfill paths. Defensive fix to ensure dirty-key stats are properly reconciled.3.
cc_req_misc.cpp—UpdateCceCkptTsCc::Execute()Relaxed 4 assertions (versioned/non-versioned × range/non-range) from:
to:
With the
BackFillfix, backfilled entries are now correctly marked persistent. But a checkpoint callback that was already scheduled beforeBackFillran can still arrive and find the entry already persistent. This is benign — the existingwas_dirty/OnEntryFlushedlogic handles it correctly (if the entry is already persistent,was_dirtyis false and no double-decrement occurs).Testing
Ran
test_add_node_with_double_write(cluster scale-out test) 6 consecutive times — all passed with zero assertion failures in any node logs:Before the fix, this test would intermittently crash with the
dirty_data_key_count_underflow assertion.Summary by CodeRabbit